Method and apparatus for generating facial feature

ABSTRACT

Methods and apparatus for generating a facial feature. A specific embodiment of the method includes: acquiring a to-be-recognized face image; inputting the to-be-recognized face image into a first convolutional neural network to generate a feature area image set of the to-be-recognized face image, the first convolutional neural network being used to extract a feature area image from a face image; inputting each feature area image in the feature area image set into a corresponding second convolutional neural network to generate an area facial feature of the feature area image, the second convolutional neural network being used to extract the area facial feature of the corresponding feature area image; and generating a facial feature set of the to-be-recognized face image based on the area facial feature of each feature area image in the feature area image set.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is related to and claims priority from Chinese Application No. 201711482448.1, filed on Dec. 29, 2017 and entitled “Method and Apparatus for Generating Facial Feature,” the entire disclosure of which is hereby incorporated by reference.

TECHNICAL FIELD

Embodiments of the present disclosure relate to the field of computer technology, specifically to the field of face recognition technology, and more specifically to a method and apparatus for generating a facial feature.

BACKGROUND

Facial recognition is generally a biometric recognition technology for identification based on facial features. The related technologies of capturing an image or a video stream containing a face using a video camera or a camera, automatically detecting and tracking the face in the image, and then performing recognition on the detected face are commonly referred to as facial recognition.

SUMMARY

Embodiments of the present disclosure propose a method and apparatus for generating a facial feature.

In a first aspect, the embodiments of the present disclosure provide a method for generating a facial feature, including: acquiring a to-be-recognized face image; inputting the to-be-recognized face image into a first convolutional neural network to generate a feature area image set of the to-be-recognized face image, the first convolutional neural network being used to extract a feature area image from a face image; inputting each feature area image in the feature area image set into a corresponding second convolutional neural network to generate an area facial feature of the feature area image, the second convolutional neural network being used to extract the area facial feature of the corresponding feature area image; and generating a facial feature set of the to-be-recognized face image based on the area facial feature of each feature area image in the feature area image set.

In some embodiments, before the inputting the to-be-recognized face image into a first convolutional neural network, the method includes: annotating a key point on the to-be-recognized face image, wherein a face area where the key point is located is used to represent a facial feature.

In some embodiments, the inputting the to-be-recognized face image into a first convolutional neural network to generate a feature area image set of the to-be-recognized face image includes: inputting the annotated to-be-recognized face image into the first convolutional neural network, and extracting the face area where the key point is located as a feature area to generate the feature area image set of the to-be-recognized face image.

In some embodiments, a spatial transformer network is further included in the first convolutional neural network for determining a feature area of the face image; and the inputting the to-be-recognized face image into a first convolutional neural network to generate a feature area image set of the to-be-recognized face image includes: inputting the to-be-recognized face image into the spatial transformer network to determine a feature area of the to-be-recognized face image; and inputting the to-be-recognized face image into the first convolutional neural network to generate the feature area image set of the to-be-recognized face image based on the determined feature area.

In some embodiments, the first convolutional neural network is obtained by training by: acquiring a sample face image containing an annotated key point; and using the sample face image as an input to train and obtain the first convolutional neural network.

In some embodiments, the second convolutional neural network is obtained by training by: acquiring a sample feature area image; classifying the sample feature area image based on a feature area displayed by the sample feature area image; and using the sample feature area image belonging to a same feature area as an input to train and obtain the second convolutional neural network corresponding to the feature area.

In some embodiments, the method further includes: recognizing the to-be-recognized face image based on the facial feature set.

In a second aspect, the embodiments of the present disclosure provide an apparatus for generating a facial feature, including: an acquisition unit, configured to acquire a to-be-recognized face image; a first generation unit, configured to input the to-be-recognized face image into a first convolutional neural network to generate a feature area image set of the to-be-recognized face image, the first convolutional neural network being used to extract a feature area image from a face image; a second generation unit, configured to input each feature area image in the feature area image set into a corresponding second convolutional neural network to generate an area facial feature of the feature area image, the second convolutional neural network being used to extract the area facial feature of the corresponding feature area image; and a third generation unit, configured to generate a facial feature set of the to-be-recognized face image based on the area facial feature of each feature area image in the feature area image set.

In some embodiments, the apparatus includes: an annotating unit, configured to annotate a key point on the to-be-recognized face image, wherein a face area where the key point is located is used to represent a facial feature.

In some embodiments, the first generation unit is further configured to: input the annotated to-be-recognized face image into the first convolutional neural network, and extract the face area where the key point is located as a feature area to generate the feature area image set of the to-be-recognized face image.

In some embodiments, a spatial transformer network is further included in the first convolutional neural network for determining a feature area of the face image; and the first generation unit includes: a determination subunit, configured to input the to-be-recognized face image into the spatial transformer network to determine a feature area of the to-be-recognized face image; and a generation subunit, configured to input the to-be-recognized face image into the first convolutional neural network to generate the feature area image set of the to-be-recognized face image based on the determined feature area.

In some embodiments, the first convolutional neural network is obtained by training by: acquiring a sample face image containing an annotated key point; and using the sample face image as an input to train and obtain the first convolutional neural network.

In some embodiments, the second convolutional neural network is obtained by training by: acquiring a sample feature area image; classifying the sample feature area image based on a feature area displayed by the sample feature area image; and using the sample feature area image belonging to a same feature area as an input to train and obtain the second convolutional neural network corresponding to the feature area.

In some embodiments, the apparatus further includes: a recognition unit, configured to recognize the to-be-recognized face image based on the facial feature set.

In a third aspect, the embodiments of the present disclosure provide an electronic device, including: one or more processors; and a storage apparatus for storing one or more programs, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of the embodiments in the first aspect.

In a fourth aspect, the embodiments of the present disclosure provide a computer readable storage medium, storing a computer program thereon, where the program, when executed by a processor, implements the method according to any one of the embodiments in the first aspect.

The method and apparatus for generating a facial feature provided by the embodiments of the present disclosure may first generate a feature area image set of the to-be-recognized face image by inputting the acquired to-be-recognized face image into a first convolutional neural network. The first convolutional neural network may be used to extract the feature area image from the face image. Then, each feature area image in the feature area image set may be inputted into a corresponding second convolutional neural network to generate an area facial feature of the feature area image. The second convolutional neural network may be used to extract the area facial feature of the corresponding feature area image. Then, a facial feature set of the to-be-recognized face image may be generated based on the area facial feature of each feature area image in the feature area image set. That is, the feature area image set generated by the first convolutional neural network may realize information sharing among the second convolutional neural networks, which may reduce the amount of data, thereby reducing the occupation of memory resources and helping to improve the generation efficiency.

BRIEF DESCRIPTION OF THE DRAWINGS

After reading detailed descriptions of non-limiting embodiments with reference to the following accompanying drawings, other features, objectives and advantages of the present disclosure will become more apparent:

FIG. 1 is an architecture diagram of an exemplary system in which the present disclosure may be implemented;

FIG. 2 is a flowchart of an embodiment of a method for generating a facial feature according to the present disclosure;

FIG. 3 is a schematic diagram of an application scenario of the method for generating a facial feature according to the present disclosure;

FIG. 4 is a schematic structural diagram of an embodiment of an apparatus for generating a facial feature according to the present disclosure; and

FIG. 5 is a schematic structural diagram of a computer system adapted to implement an electronic device according to the embodiments of the present disclosure.

DETAILED DESCRIPTION OF EMBODIMENTS

The present disclosure will be further described below in detail in combination with the accompanying drawings and the embodiments. It should be appreciated that the specific embodiments described herein are merely used for explaining the relevant disclosure, rather than limiting the disclosure. In addition, it should be noted that, for the ease of description, only the parts related to the relevant disclosure are shown in the accompanying drawings.

It should be noted that the embodiments in the present disclosure and the features in the embodiments may be combined with each other on a non-conflict basis. The present disclosure will be described below in detail with reference to the accompanying drawings and in combination with the embodiments.

FIG. 1 shows an architecture of an exemplary system 100 which may be used by a method for generating a facial feature or an apparatus for generating a facial feature according to the embodiments of the present disclosure.

As shown in FIG. 1, the system architecture 100 may include terminals 101, 102 and 103, networks 104 and 106, a server 105 and database servers 107 and 108. The network 104 serves as a medium providing a communication link between the terminals 101, 102 and 103, and the server 105. The network 106 serves as a medium providing a communication link between the server 105 and the database servers 107 and 108. The networks 104 and 106 may include various types of connections, such as wired or wireless transmission links, or optical fibers.

The terminals 101, 102 and 103 may be various electronic devices having an image acquisition apparatus (e.g., a camera), including but not limited to smart phones, tablet computers, e-readers, laptop computers, desktop computers and sensors.

In addition, the terminals 101, 102, 103 may also be various electronic devices having display screens. Thus, a user may interact with the server 105 via the network 104 using the terminals 101, 102, 103 to receive or send messages and the like. Various client applications, such as image browsing applications (such as photo browsers or video players), web browsers, search applications, or instant messengers, may be installed on the terminals 101, 102, and 103.

The database servers 107, 108 may also be servers providing various services; for example, the database servers 107, 108 may store a pre-trained second convolutional neural network. Here, the second convolutional neural network is used to extract the area facial feature of the corresponding feature area image. The database servers 107, 108 may input the received feature area image into the second convolutional neural network to generate the area facial feature of the feature area image.

The server 105 may also be a server providing various services, for example, a backend server that provides support for various applications displayed on the terminals 101, 102, 103. The backend server may perform analyzing and processing on a to-be-recognized face image sent by the terminals 101, 102, 103, input the to-be-recognized face image into a pre-trained and stored first convolutional neural network, and send the processing result (the generated feature area image set) to the terminals 101, 102, 103. At the same time, the backend server may generate a facial feature set based on the area facial feature generated by the database servers 107, 108, and send the facial feature set to the terminals 101, 102, 103.

It should be noted that the method for generating a facial feature according to the embodiments of the present disclosure is generally executed by the server 105. Accordingly, the apparatus for generating a facial feature is generally installed on the server 105.

It should be noted that when the server 105 has the functions of the database servers 107, 108, the system architecture 100 may not include the database servers 107, 108.

It should be appreciated that the numbers of the terminals, the networks, the servers, and the database servers in FIG. 1 are merely illustrative. Any number of terminals, networks, servers and database servers may be provided based on the actual requirements.

With further reference to FIG. 2, a flow 200 of an embodiment of the method for generating a facial feature according to the present disclosure is illustrated. The method for generating a facial feature may include the following steps.

Step 201, acquiring a to-be-recognized face image.

In the present embodiment, the electronic device (e.g., the server 105 as shown in FIG. 1) on which the method for generating a facial feature is performed may acquire a to-be-recognized face image from a terminal communicatively connected thereto (for example, the terminals 101, 102, 103 as shown in FIG. 1) through a wired connection or a wireless connection. Here, the to-be-recognized face image may be an image containing a face.

Here, the face image in the to-be-recognized face image may be an image of a partial face (i.e., a face with incomplete face information, such as a side face or a face partially obscured by clothing or jewelry), or may be an image of a whole face (i.e., a face with complete face information, such as an unobscured front face). In addition, the to-be-recognized face image may be a color image or a grayscale image. The specific format of the to-be-recognized face image, such as jpg (Joint Photographic Experts Group, a picture format), BMP (Bitmap, an image file format) or RAW (RAW Image Format, a lossless compression format), is not limited in the present disclosure, as long as the format can be recognized by the electronic device.
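As an illustrative sketch only, and not part of the disclosed embodiments, the following Python code shows one way such an input image might be decoded into a grayscale or color array; the file name "face.jpg" and the use of the Pillow library are assumptions of the example.

    from PIL import Image
    import numpy as np

    # "face.jpg" is a hypothetical path; any format the device can decode works.
    image = Image.open("face.jpg")
    gray = np.asarray(image.convert("L"))     # grayscale copy
    color = np.asarray(image.convert("RGB"))  # color copy
    print(gray.shape, color.shape)            # e.g., (H, W) and (H, W, 3)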

Step 202, inputting the to-be-recognized face image into a first convolutional neural network to generate a feature area image set of the to-be-recognized face image.

In the present embodiment, for the to-be-recognized face image acquired in step 201, the electronic device may input the to-be-recognized face image into the first convolutional neural network to generate the feature area image set of the to-be-recognized face image. Here, the first convolutional neural network may be used to extract the feature area image from the face image. The feature area image may be an image for characterizing a feature of an area of a face, such as a left eye area image or a forehead area image.

In some alternative implementations of the present embodiment, before inputting the to-be-recognized face image into the first convolutional neural network, the electronic device may first annotate a key point on the to-be-recognized face image. Here, a face area where the key point is located is used to represent a facial feature. Then, the electronic device may input the annotated to-be-recognized face image into the first convolutional neural network, and extract the face area where the key point is located as a feature area to generate the feature area image set of the to-be-recognized face image.

For example, the electronic device may receive an annotating operation instruction sent by the user using the terminal. Here, the annotating operation instruction may include the location of the key point. At this time, the electronic device may perform key point annotating on the to-be-recognized face image based on the location in the annotating operation instruction, and input the annotated to-be-recognized face image into the first convolutional neural network.

For another example, the electronic device may first detect the facial features (such as the five sense organs or the facial contour) of the to-be-recognized face image. Then, based on the detection result, key points may be annotated for different features. As an example, for a mouth feature, four key points may be annotated, respectively located at the two mouth corners, the center edge of the upper lip and the center edge of the lower lip. This can reduce the manual participation process and help improve processing efficiency. It may be understood that the electronic device may implement key point annotating by using a face feature point localization method commonly used in the prior art, such as a local-based method (e.g., the Constrained Local Model, CLM) or a global-based method.
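As a non-limiting sketch of this automatic annotation step, the following Python code uses dlib's off-the-shelf frontal face detector and 68-point landmark model as a stand-in for the key point annotating described above; the model file path is an assumption, and the model file must be obtained separately.

    import dlib
    import numpy as np

    detector = dlib.get_frontal_face_detector()
    # Hypothetical local path to dlib's separately distributed 68-point model.
    predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

    def annotate_key_points(gray: np.ndarray) -> np.ndarray:
        """Return an (N, 2) array of key point coordinates for the first detected face."""
        faces = detector(gray)
        if not faces:
            raise ValueError("no face detected")
        shape = predictor(gray, faces[0])
        return np.array([(p.x, p.y) for p in shape.parts()])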

It should be noted that other information such as the size, shape and color of the key point is not limited in the present disclosure. The information may be preset or may be included in the above annotating operation instruction.

Then, face detection and key point detection may first be performed using the first convolutional neural network. Then, based on the location of the key point on the to-be-recognized face image, the cropping location and size may be determined, so that a to-be-cropped face area containing the key point may be used as the feature area. For example, the four key points of the mouth may be respectively used as vertices of a quadrilateral or as points on the four sides of a quadrilateral, thereby obtaining the to-be-cropped feature area. Finally, based on the determined feature area, the image on the feature layer obtained by mapping the to-be-recognized face image through processing such as convolution may be cropped, to generate a feature area image set of the to-be-recognized face image. It should be noted that the shape of the feature area is not limited in the present disclosure.
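A minimal sketch of deriving the cropping location and size from a group of key points might look as follows; the margin value and the axis-aligned box shape are assumptions of the example, since the disclosure does not limit the shape of the feature area.

    import numpy as np

    def feature_area_from_key_points(points: np.ndarray, margin: float = 0.2):
        """Treat the key points as points on the sides of the to-be-cropped area and
        return an axis-aligned crop box (x0, y0, x1, y1) enlarged by a small margin."""
        x0, y0 = points.min(axis=0)
        x1, y1 = points.max(axis=0)
        dx, dy = (x1 - x0) * margin, (y1 - y0) * margin
        return int(x0 - dx), int(y0 - dy), int(x1 + dx), int(y1 + dy)

    # Example: four mouth key points (two corners, upper- and lower-lip centers).
    mouth = np.array([[120, 200], [180, 200], [150, 190], [150, 212]])
    print(feature_area_from_key_points(mouth))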

Alternatively, in order to simplify the processing, the facial features (such as the five sense organs or the facial contour) of the to-be-recognized face image may be detected using the first convolutional neural network. Then, based on the detection result, the cropping locations and sizes of different features are determined, such that the to-be-cropped face area containing the feature of the face may be used as the feature area. Further, the feature area image set of the to-be-recognized face image may be generated. This also helps to increase the scope of application of the first convolutional neural network.

Further, in order to improve the accuracy of the generation result, a spatial transformer network (STN) may further be set in the first convolutional neural network for determining a feature area of the face image. In this case, the electronic device may input the to-be-recognized face image into the spatial transformer network to determine a feature area of the to-be-recognized face image. In this way, for the inputted to-be-recognized face image, the first convolutional neural network may extract the image matching the feature area on the feature layer based on the feature area determined by the spatial transformer network, to generate the feature area image set of the to-be-recognized face image.

It should be noted that the specific setting location of the spatial transformer network in the first convolutional neural network is not limited in the present disclosure. The spatial transformer network may determine the feature areas of different features of different face images by continuous learning. At the same time, during the processing such as convolution performed on the inputted face image by the first convolutional neural network, the spatial transformer network may learn which feature data is relatively important, as a reference guide. Thus, for an inputted face image, the spatial transformer network may first determine an initial feature area; then, after the first convolutional neural network (which has been trained) generates an initial feature area image set, feedback adjustment may be performed through the loss function until the value of the loss function tends to be stable or reaches a preset threshold (e.g., 0.5, which may be set based on actual conditions). At this point, the learning of the spatial transformer network may be considered to have ended. That is to say, the first convolutional neural network (excluding the spatial transformer network) may be trained first, and then the network including the spatial transformer network is initialized using the parameters obtained from that training.

In addition, it may be understood that the spatial transformer network can not only guide the first convolutional neural network to perform feature area cropping, but can also perform simple spatial transformations such as translation, rotation, or scaling on the inputted data. This not only helps to improve the processing effect of the first convolutional neural network, but also reduces the requirements on the to-be-recognized face image and broadens the scope of application.
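For illustration only, the following PyTorch sketch shows one common way a spatial transformer module can be realized: a small localization network predicts an affine transform (covering translation, rotation and scaling), which is then used to sample a fixed-size feature area from the input. The layer sizes, the 224x224 input and the 64x64 output are assumptions of the sketch, not parameters of the disclosed embodiments.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SpatialTransformer(nn.Module):
        """Predicts an affine transform (crop, translation, scale) for one feature area."""
        def __init__(self):
            super().__init__()
            self.localization = nn.Sequential(
                nn.Conv2d(1, 8, kernel_size=7), nn.MaxPool2d(2), nn.ReLU(),
                nn.Conv2d(8, 10, kernel_size=5), nn.MaxPool2d(2), nn.ReLU(),
            )
            self.fc_loc = nn.Sequential(
                nn.Linear(10 * 52 * 52, 32), nn.ReLU(), nn.Linear(32, 6),
            )
            # Start from the identity transform, i.e., a full-image "crop".
            self.fc_loc[-1].weight.data.zero_()
            self.fc_loc[-1].bias.data.copy_(
                torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

        def forward(self, x):  # x: (N, 1, 224, 224)
            theta = self.fc_loc(self.localization(x).flatten(1)).view(-1, 2, 3)
            # Sample a fixed-size feature area image from the predicted region.
            grid = F.affine_grid(theta, (x.size(0), x.size(1), 64, 64),
                                 align_corners=False)
            return F.grid_sample(x, grid, align_corners=False)

    crop = SpatialTransformer()(torch.randn(1, 1, 224, 224))  # -> (1, 1, 64, 64)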

In some alternative implementations of the present embodiment, the first convolutional neural network may be obtained by training by the following steps: first, a sample face image containing an annotated key point may be acquired; then, the sample face image may be used as an input to train and obtain the first convolutional neural network.

As an example, a sample face image containing an annotated key point and a corresponding sample feature area image may be acquired, the sample face image being used as an input and the sample feature area image being used as an output. In this case, the first convolutional neural network may perform face detection and key point detection on the inputted sample face image, and acquire the locations of the face and the key point. Then, for the locations of the corresponding key points, feature area images of different locations and sizes may be generated. Next, the generated feature area images may be matched with the corresponding sample feature area image. In this way, the parameters of the first convolutional neural network may be continuously adjusted through feedback based on the matching result.
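By way of a highly simplified, assumed sketch of this feedback training, the code below regresses the crop box of a single feature area from a sample face image and adjusts the parameters against the box derived from the annotated key points; the architecture, the tensor sizes, and the smooth-L1 matching loss are illustrative choices, not the disclosed training procedure.

    import torch
    import torch.nn as nn

    class CropRegressor(nn.Module):
        """Toy first-network stand-in: predicts one normalized crop box (x0, y0, x1, y1)."""
        def __init__(self):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            )
            self.head = nn.Linear(32 * 4 * 4, 4)

        def forward(self, x):
            return self.head(self.backbone(x))

    model = CropRegressor()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.SmoothL1Loss()

    images = torch.randn(8, 1, 224, 224)  # batch of sample face images
    boxes = torch.rand(8, 4)              # boxes derived from the annotated key points
    loss = loss_fn(model(images), boxes)  # match generated areas against the samples
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()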

Alternatively, in order to improve the function of the first convolutional neural network and expand the scope of application of the first convolutional neural network, the first convolutional neural network may also be obtained by training by the following steps: first, a sample face image (with no key points annotated) may be obtained; then, the sample face image may be used as an input to train and obtain the first convolutional neural network.

As an example, a sample face image without an annotated key point and a corresponding sample feature area image may be acquired, the sample face image being used as an input and the sample feature area image being used as an output to train and obtain the first convolutional neural network.

Step 203, inputting each feature area image in the feature area image set into a corresponding second convolutional neural network to generate an area facial feature of the feature area image.

In the present embodiment, for the feature area image set generated in step 202, the electronic device may input each feature area image in the feature area image set into a corresponding second convolutional neural network to generate the area facial feature of each feature area image. Here, the second convolutional neural network may be used to extract the area facial feature of the corresponding feature area image. For example, a feature area image of the nose is inputted into the second convolutional neural network for processing a nose image. Here, the area facial feature may be a facial feature for describing an area face image in the feature area image, such as a facial feature of the nose area.
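The routing of each feature area image to its own second network might be sketched as follows; the area names, the placeholder architectures, and the 64x64 grayscale crops are assumptions of the example.

    import torch
    import torch.nn as nn

    def make_area_net(feature_dim: int = 128) -> nn.Module:
        # Placeholder second convolutional neural network for one feature area.
        return nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(8),
            nn.Flatten(), nn.Linear(16 * 8 * 8, feature_dim),
        )

    # One independent network per feature area, as the embodiment describes.
    second_networks = {area: make_area_net()
                       for area in ("left_eye", "right_eye", "nose", "mouth")}

    def area_facial_features(crops: dict) -> dict:
        """Route each feature area image to the second network for that area."""
        return {area: second_networks[area](img) for area, img in crops.items()}

    crops = {area: torch.randn(1, 1, 64, 64) for area in second_networks}
    features = area_facial_features(crops)  # one area facial feature per crop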

In some alternative implementations of the present embodiment, the second convolutional neural network may be obtained by training by the following steps: first, a sample feature area image may be acquired. Then, the sample feature area image may be classified based on the feature area (such as the nose or the mouth) displayed by the sample feature area image. Next, the sample feature area images belonging to the same feature area may be used as an input to train and obtain the second convolutional neural network corresponding to the feature area. It should be noted that the acquired sample feature area image may be, but is not limited to being, generated by the first convolutional neural network.

As an example, a sample feature area image and a sample facial feature (a facial feature for describing an area face image in the corresponding sample feature area image) corresponding to the sample feature area image may be acquired. Then, the sample feature area images belonging to the same feature area may be used as an input, and the corresponding sample facial features may be used as an output, to train and obtain the second convolutional neural network corresponding to the feature area.
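A compact sketch of this per-area training, under the assumption that the sample facial feature is a fixed-length vector matched with a mean-squared-error loss, might read:

    import torch
    import torch.nn as nn
    from collections import defaultdict

    def new_area_net(dim: int = 128) -> nn.Module:
        # Illustrative second convolutional neural network for a single feature area.
        return nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(8),
            nn.Flatten(), nn.Linear(16 * 8 * 8, dim))

    def train_second_networks(samples, epochs: int = 5) -> dict:
        """samples: iterable of (area_name, crop, target_feature) triples, where
        crop is a (1, 1, 64, 64) tensor and target_feature is a (1, 128) tensor."""
        by_area = defaultdict(list)
        for area, crop, target in samples:       # classify samples by feature area
            by_area[area].append((crop, target))
        networks = {}
        for area, pairs in by_area.items():      # one network per feature area
            net, loss_fn = new_area_net(), nn.MSELoss()
            opt = torch.optim.Adam(net.parameters(), lr=1e-3)
            for _ in range(epochs):
                for crop, target in pairs:
                    opt.zero_grad()
                    loss_fn(net(crop), target).backward()
                    opt.step()
            networks[area] = net
        return networks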

Alternatively, the second convolutional neural network may also be obtained by training by the following steps: first, a sample face image and sample face area features each describing a different area of the face in the sample face image may be acquired. Then, the sample face image may be used as an input, and the different sample face area features are respectively used as outputs, to train and obtain second convolutional neural networks corresponding to the face images of the different areas. This helps to reduce the number of sample face images required.

Step 204, generating a facial feature set of the to-be-recognized face image based on the area facial feature of each feature area image in the feature area image set.

In the present embodiment, based on the area facial feature of each feature area image in the feature area image set generated in step 203, the electronic device may generate a facial feature set of the to-be-recognized face image. Here, the facial feature in the facial feature set may be used to describe the facial feature of the to-be-recognized face image.

As an example, the electronic device may perform de-duplication processing on the area facial features of the feature area images. The area facial features after the de-duplication processing may be stored, thereby generating a facial feature set of the to-be-recognized face image.

For example, the electronic device may also filter the area facial features of the feature area images, thereby removing the facial features that are common to most face images, so as to select the distinctive facial features of the to-be-recognized face image (for example, a facial feature describing a scar on the face). The selected facial features are stored to generate a facial feature set of the to-be-recognized face image.
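For the de-duplication and filtering steps, one assumed redundancy measure is cosine similarity between area facial feature vectors, as in the sketch below; the 0.95 threshold is purely illustrative.

    import torch
    import torch.nn.functional as F

    def deduplicate(features: list, threshold: float = 0.95) -> list:
        """Keep a feature only if it is not a near-duplicate of one already kept."""
        kept = []
        for f in features:
            if all(F.cosine_similarity(f, k, dim=0).item() < threshold for k in kept):
                kept.append(f)
        return kept

    facial_feature_set = deduplicate([torch.randn(128) for _ in range(6)])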

In some alternative implementations of the present embodiment, the electronic device may also perform face detection and/or face recognition on the to-be-recognized face image based on the generated facial feature set of the to-be-recognized face image.
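How such a facial feature set might drive recognition is sketched below: the query set is compared area by area against an enrolled set, and the per-area similarities are averaged. The fusion rule and the decision threshold are assumptions of the sketch, not part of the disclosed embodiments.

    import torch
    import torch.nn.functional as F

    def match_score(query: dict, enrolled: dict) -> float:
        """Average per-area cosine similarity over the areas present in both sets."""
        common = query.keys() & enrolled.keys()
        sims = [F.cosine_similarity(query[a], enrolled[a], dim=0).item()
                for a in common]
        return sum(sims) / len(sims) if sims else 0.0

    q = {"nose": torch.randn(128), "mouth": torch.randn(128)}
    e = {"nose": torch.randn(128), "mouth": torch.randn(128),
         "left_eye": torch.randn(128)}
    accepted = match_score(q, e) > 0.6  # hypothetical acceptance threshold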

Alternatively, the electronic device may also output the facial feature set of the to-be-recognized face image. The output form is not limited in the present disclosure, and may be text, audio, or image. In addition, the output may also be stored, for example locally, or on an external medium or device.

It may be understood that the first convolutional neural network and the second convolutional neural network in the present embodiment may be mutually independent convolutional neural networks, or may be different parts of the same convolutional neural network, which is not limited in the present disclosure.

In addition, the first convolutional neural network and the second convolutional neural network may be trained independently, as in the training processes described above, or may be trained together through a series of associated trainings. For example, a sample face image containing an annotated key point is inputted into the first convolutional neural network to obtain sample feature area images; the obtained sample feature area images are respectively inputted into the corresponding second convolutional neural networks to obtain sample facial features of the sample feature area images; then the sample facial features are used to identify or detect the sample face. Thus, based on the identification or detection result, the parameters of the first convolutional neural network and the second convolutional neural network may be adjusted and retrained. This helps to simplify the training process and may reduce the number of samples. At the same time, it helps to improve the correlation between the first convolutional neural network and the second convolutional neural network, and to improve the accuracy of the generated result.
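One plausible reading of this associated training, sketched under the assumption that both stages are PyTorch modules, is simply to optimize the parameters of the first network and of every second network with a single optimizer driven by the identification or detection loss:

    import itertools
    import torch
    import torch.nn as nn

    # Dummy stand-ins; in practice these would be the trained networks from above.
    first_net = nn.Conv2d(1, 8, 3)
    second_nets = {"nose": nn.Linear(128, 128), "mouth": nn.Linear(128, 128)}

    # A single optimizer over both stages lets the feedback from identification
    # or detection adjust the first and second networks together.
    params = itertools.chain(first_net.parameters(),
                             *(net.parameters() for net in second_nets.values()))
    optimizer = torch.optim.SGD(params, lr=1e-4)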

The method for generating a facial feature provided by the present embodiment may first generate a feature area image set of the to-be-recognized face image by inputting the acquired to-be-recognized face image into a first convolutional neural network. The first convolutional neural network may be used to extract the feature area image from the face image. Then, each feature area image in the feature area image set may be inputted into a corresponding second convolutional neural network to generate an area facial feature of the feature area image. The second convolutional neural network may be used to extract the area facial feature of the corresponding feature area image. Then, a facial feature set of the to-be-recognized face image may be generated based on the area facial feature of each feature area image in the feature area image set. That is, the feature area image set generated by the first convolutional neural network may realize information sharing among the second convolutional neural networks, which may reduce the amount of data, thereby reducing the occupation of memory resources and helping to improve the generation efficiency.

With further reference to FIG. 3, a schematic diagram of an application scenario of the method for generating a facial feature according to the present embodiment is illustrated. In the application scenario of FIG. 3, the user may use the camera installed on the terminal 31 to collect the to-be-recognized face image 311, and the to-be-recognized face image 311 may be sent to the server 32 through the terminal 31. After receiving the to-be-recognized face image 311, the server 32 may first input the to-be-recognized face image 311 into the pre-stored first convolutional neural network 321 to generate a feature area image set. Then, each feature area image in the feature area image set may be respectively inputted into the corresponding pre-stored second convolutional neural network 322, thereby generating an area facial feature of each feature area image. Then, the server 32 may generate a facial feature set 323 of the to-be-recognized face image 311 based on the area facial features, and send the facial feature set 323 to the terminal 31.

Furthermore, the server 32 may perform facial recognition on the to-be-recognized face image 311 through the generated facial feature set 323, and may send the recognition result to the terminal 31 (not shown in FIG. 3).

With further reference to FIG. 4, as an implementation of the method shown in the above figures, the present disclosure provides an embodiment of an apparatus for generating a facial feature. The apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus may specifically be applied to various electronic devices.

As shown in FIG. 4, the apparatus 400 for generating a facial feature of the present embodiment may include: an acquisition unit 401, configured to acquire a to-be-recognized face image; a first generation unit 402, configured to input the to-be-recognized face image into a first convolutional neural network to generate a feature area image set of the to-be-recognized face image, the first convolutional neural network being used to extract a feature area image from a face image; a second generation unit 403, configured to input each feature area image in the feature area image set into a corresponding second convolutional neural network to generate an area facial feature of the feature area image, the second convolutional neural network being used to extract the area facial feature of the corresponding feature area image; and a third generation unit 404, configured to generate a facial feature set of the to-be-recognized face image based on the area facial feature of each feature area image in the feature area image set.

In the present embodiment, for the specific implementations of the acquisition unit 401, the first generation unit 402, the second generation unit 403, and the third generation unit 404, and the beneficial effects thereof, reference may be made to the related descriptions of step 201, step 202, step 203, and step 204 in the embodiment shown in FIG. 2, respectively; the detailed description thereof is omitted here.

In some alternative implementations of the present embodiment, the apparatus 400 may include: an annotating unit (not shown in the figure), configured to annotate a key point on the to-be-recognized face image, wherein a face area where the key point is located is used to represent a facial feature.

Alternatively, the first generation unit 402 may be further configured to: input the annotated to-be-recognized face image into the first convolutional neural network, and extract the face area where the key point is located as a feature area to generate the feature area image set of the to-be-recognized face image.

As an example, a spatial transformer network may further be included in the first convolutional neural network for determining a feature area of the face image; and the first generation unit 402 may include: a determination subunit (not shown in the figure), configured to input the to-be-recognized face image into the spatial transformer network to determine a feature area of the to-be-recognized face image; and a generation subunit (not shown in the figure), configured to input the to-be-recognized face image into the first convolutional neural network to generate the feature area image set of the to-be-recognized face image based on the determined feature area.

In some embodiments, the first convolutional neural network may be obtained by training by: acquiring a sample face image containing an annotated key point; and using the sample face image as an input to train and obtain the first convolutional neural network.

Further, the second convolutional neural network may be obtained by training by: acquiring a sample feature area image; classifying the sample feature area image based on a feature area displayed by the sample feature area image; and using the sample feature area image belonging to a same feature area as an input to train and obtain the second convolutional neural network corresponding to the feature area.

In some embodiments, the apparatus 400 may further include: a recognition unit (not shown in the figure), configured to recognize the to-be-recognized face image based on the facial feature set.

Referring to FIG. 5, a schematic structural diagram of a computer system 500 adapted to implement an electronic device according to embodiments of the present disclosure is shown. The electronic device shown in FIG. 5 is merely an example, and should not impose any limitation on the functions and the scope of use of the embodiments of the present disclosure.

As shown in FIG. 5, the computer system 500 includes a central processing unit (CPU) 501, which may execute various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 502 or a program loaded into a random access memory (RAM) 503 from a storage portion 508. The RAM 503 also stores various programs and data required by the operations of the system 500. The CPU 501, the ROM 502 and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.

The following components are connected to the I/O interface 505: an input portion 506 including a keyboard, a mouse, etc.; an output portion 507 comprising a cathode ray tube (CRT), a liquid crystal display device (LCD), a speaker, etc.; a storage portion 508 including a hard disk and the like; and a communication portion 509 comprising a network interface card, such as a LAN card and a modem. The communication portion 509 performs communication processes via a network, such as the Internet. A driver 510 is also connected to the I/O interface 505 as required. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, may be installed on the driver 510, to facilitate the retrieval of a computer program from the removable medium 511, and the installation thereof on the storage portion 508 as needed.

In particular, according to embodiments of the present disclosure, the process described above with reference to the flow chart may be implemented in a computer software program. For example, an embodiment of the present disclosure includes a computer program product, which comprises a computer program that is tangibly embedded in a machine-readable medium. The computer program comprises program codes for executing the method as illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 509, and/or may be installed from the removable medium 511. The computer program, when executed by the central processing unit (CPU) 501, implements the above-mentioned functionalities as defined by the methods of the present disclosure.

It should be noted that the computer readable medium in the present disclosure may be a computer readable signal medium, a computer readable storage medium, or any combination of the two. An example of the computer readable storage medium may include, but is not limited to: electric, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, elements, or a combination of any of the above. A more specific example of the computer readable storage medium may include, but is not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read only memory (ROM), an erasable programmable read only memory (EPROM or flash memory), an optical fiber, a portable compact disk read only memory (CD-ROM), an optical memory, a magnetic memory, or any suitable combination of the above. In the present disclosure, the computer readable storage medium may be any physical medium containing or storing programs, which may be used by, or incorporated into, a command execution system, apparatus or element. In the present disclosure, the computer readable signal medium may include a data signal in the base band or propagating as part of a carrier wave, in which computer readable program codes are carried. The propagating signal may take various forms, including but not limited to: an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer readable signal medium may be any computer readable medium other than the computer readable storage medium, and is capable of transmitting, propagating or transferring programs for use by, or in combination with, a command execution system, apparatus or element. The program codes contained on the computer readable medium may be transmitted with any suitable medium, including but not limited to: wireless, wired, optical cable, or RF medium, or any suitable combination of the above.

The flow charts and block diagrams in the accompanying drawings illustrate the architectures, functions and operations that may be implemented according to the systems, methods and computer program products of the various embodiments of the present disclosure. In this regard, each of the blocks in the flow charts or block diagrams may represent a module, a program segment, or a code portion, said module, program segment, or code portion comprising one or more executable instructions for implementing specified logic functions. It should also be noted that, in some alternative implementations, the functions denoted by the blocks may occur in a sequence different from the sequences shown in the figures. For example, any two blocks presented in succession may be executed substantially in parallel, or they may sometimes be executed in a reverse sequence, depending on the function involved. It should also be noted that each block in the block diagrams and/or flow charts, as well as a combination of blocks, may be implemented using a dedicated hardware-based system executing specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The units involved in the embodiments of the present disclosure may be implemented by means of software or hardware. The described units may also be provided in a processor, for example, described as: a processor comprising an acquisition unit, a first generation unit, a second generation unit, and a third generation unit, where the names of these units do not in some cases constitute a limitation to such units themselves. For example, the acquisition unit may also be described as “a unit for acquiring a to-be-recognized face image.”

In another aspect, the present disclosure further provides a computer-readable storage medium. The computer-readable storage medium may be the computer storage medium included in the electronic device in the above described embodiments, or a stand-alone computer-readable storage medium not assembled into the electronic device. The computer-readable storage medium stores one or more programs. The one or more programs, when executed by the electronic device, cause the electronic device to: acquire a to-be-recognized face image; input the to-be-recognized face image into a first convolutional neural network to generate a feature area image set of the to-be-recognized face image, the first convolutional neural network being used to extract a feature area image from a face image; input each feature area image in the feature area image set into a corresponding second convolutional neural network to generate an area facial feature of the feature area image, the second convolutional neural network being used to extract the area facial feature of the corresponding feature area image; and generate a facial feature set of the to-be-recognized face image based on the area facial feature of each feature area image in the feature area image set.

The above description only provides an explanation of the preferred embodiments of the present disclosure and the technical principles used. It should be appreciated by those skilled in the art that the inventive scope of the present disclosure is not limited to the technical solutions formed by the particular combinations of the above-described technical features. The inventive scope should also cover other technical solutions formed by any combination of the above-described technical features or equivalent features thereof without departing from the concept of the disclosure, for example, technical solutions formed by interchanging the above-described features with, but not limited to, technical features having similar functions disclosed in the present disclosure.

What is claimed is:
 1. A method for generating a facial feature, the method comprising: acquiring a to-be-recognized face image; annotating a plurality of key points on the to-be-recognized face image, the plurality of key points being used to determine a cropping location and a cropping size of a facial part; inputting the annotated to-be-recognized face image into a first convolutional neural network to generate a feature area image set of the annotated to-be-recognized face image, the first convolutional neural network being used to extract a feature area image from a face image, the feature area image set comprising a plurality of feature area images corresponding to respective organs; inputting each of the plurality of feature area images in the feature area image set into a corresponding second convolutional neural network to generate an area facial feature of a corresponding feature area image, the second convolutional neural network being used to extract the area facial feature of the corresponding feature area image, different second neural networks corresponding to different organs; and generating a facial feature set of the to-be-recognized face image based on the area facial feature of each of the plurality of feature area images in the feature area image set.
 2. The method according to claim 1, wherein a face area where a key point is located is used to represent a facial feature.
 3. The method according to claim 2, wherein the inputting the to-be-recognized face image into a first convolutional neural network to generate a feature area image set of the to-be-recognized face image comprises: inputting the annotated to-be-recognized face image into the first convolutional neural network, and extracting the face area where the key point is located as a feature area to generate the feature area image set of the to-be-recognized face image.
 4. The method according to claim 1, wherein the first convolutional neural network is obtained by training: acquiring a sample face image containing an annotated key point; and using the sample face image as an input to train and obtain the first convolutional neural network.
 5. The method according to claim 1, wherein the second convolutional neural network is obtained by training: acquiring a sample feature area image; classifying the sample feature area image based on a feature area displayed by the sample feature area image; and using the sample feature area image belonging to a same feature area as an input to train and obtain the second convolutional neural network corresponding to the feature area.
 6. The method according to claim 1, the method further comprising: recognizing the to-be-recognized face image based on the facial feature set.
 7. An apparatus for generating a facial feature, the apparatus comprising: at least one processor; and a memory storing instructions which, when executed by the at least one processor, cause the at least one processor to perform operations, the operations comprising: acquiring a to-be-recognized face image; annotating a plurality of key points on the to-be-recognized face image, the plurality of key points being used to determine a cropping location and a cropping size of a facial part; inputting the annotated to-be-recognized face image into a first convolutional neural network to generate a feature area image set of the annotated to-be-recognized face image, the first convolutional neural network being used to extract a feature area image from a face image, the feature area image set comprising a plurality of feature area images corresponding to respective organs; inputting each of the plurality of feature area images in the feature area image set into a corresponding second convolutional neural network to generate an area facial feature of a corresponding feature area image, the second convolutional neural network being used to extract the area facial feature of the corresponding feature area image, different second neural networks corresponding to different organs; and generating a facial feature set of the to-be-recognized face image based on the area facial feature of each of the plurality of feature area images in the feature area image set.
 8. The apparatus according to claim 7, wherein a face area where the key point is located is used to represent a facial feature.
 9. The apparatus according to claim 8, wherein the inputting the to-be-recognized face image into a first convolutional neural network to generate a feature area image set of the to-be-recognized face image comprises: inputting the annotated to-be-recognized face image into the first convolutional neural network, and extracting the face area where the key point is located as a feature area to generate the feature area image set of the to-be-recognized face image.
 10. The apparatus according to claim 7, wherein the first convolutional neural network is obtained by training: acquiring a sample face image containing an annotated key point; and using the sample face image as an input to train and obtain the first convolutional neural network.
 11. The apparatus according to claim 7, wherein the second convolutional neural network is obtained by training: acquiring a sample feature area image; classifying the sample feature area image based on a feature area displayed by the sample feature area image; and using the sample feature area image belonging to a same feature area as an input to train and obtain the second convolutional neural network corresponding to the feature area.
 12. The apparatus according to claim 7, the operations further comprising: recognizing the to-be-recognized face image based on the facial feature set.
 13. A non-transitory computer storage medium storing a computer program, the computer program, when executed by one or more processors, causing the one or more processors to perform operations, the operations comprising: acquiring a to-be-recognized face image; annotating a plurality of key points on the to-be-recognized face image, the plurality of key points being used to determine a cropping location and a cropping size of a facial part; inputting the annotated to-be-recognized face image into a first convolutional neural network to generate a feature area image set of the annotated to-be-recognized face image, the first convolutional neural network being used to extract a feature area image from a face image, the feature area image set comprising a plurality of feature area images corresponding to respective organs; inputting each of the plurality of feature area images in the feature area image set into a corresponding second convolutional neural network to generate an area facial feature of a corresponding feature area image, the second convolutional neural network being used to extract the area facial feature of the corresponding feature area image, different second neural networks corresponding to different organs; and generating a facial feature set of the to-be-recognized face image based on the area facial feature of each of the plurality of feature area images in the feature area image set.
 14. The method according to claim 1, wherein a spatial transformer network is further included in the first convolutional neural network for determining a feature area of the face image; and the inputting the to-be-recognized face image into a first convolutional neural network to generate a feature area image set of the to-be-recognized face image comprises: inputting the to-be-recognized face image into the spatial transformer network to determine a feature area of the to-be-recognized face image; and inputting the to-be-recognized face image into the first convolutional neural network to generate the feature area image set of the to-be-recognized face image based on the determined feature area.
 15. The apparatus according to claim 7, wherein a spatial transformer network is further included in the first convolutional neural network for determining a feature area of the face image; and the inputting the to-be-recognized face image into a first convolutional neural network to generate a feature area image set of the to-be-recognized face image comprises: inputting the to-be-recognized face image into the spatial transformer network to determine a feature area of the to-be-recognized face image; and inputting the to-be-recognized face image into the first convolutional neural network to generate the feature area image set of the to-be-recognized face image based on the determined feature area.